spectral sum
Quantum algorithms for spectral sums
Luongo, Alessandro, Shao, Changpeng
The trace of matrix function, far from being only of theoretical interest, appears in many practical applications of linear algebra. To name a few, it has applications in machine learning, computational chemistry, biology, statistics, finance, and many others [1, 6, 13, 14, 24, 26, 31, 49, 52, 53]. While the problem of estimating some spectral quantities dates back to decades, many fast classical algorithms have been developed recently [7, 28, 29, 38, 47, 58, 61], highlighting the importance of spectral sums in many numerical problems. The spectral sum is defined as the sum of the eigenvalues of a matrix after a given function is applied to them. Oftentimes, the matrix will be symmetric positive definite (SPD), but there are cases where this assumption is relaxed. As an example, the logarithm of the determinant is perhaps the most common example of spectral sum, as the determinant is one of the most important properties associated with a matrix. However, the standard definition does not offer an efficient way of computing it. Remarkably, it is often the case that the logarithm of the determinant is the quantity that is effectively needed in the applications, which is much more amenable to estimation.
Optimizing Spectral Sums using Randomized Chebyshev Expansions
Han, Insu, Avron, Haim, Shin, Jinwoo
The trace of matrix functions, often called spectral sums, e.g., rank, log-determinant and nuclear norm, appear in many machine learning tasks. However, optimizing or computing such (parameterized) spectral sums typically involves the matrix decomposition at the cost cubic in the matrix dimension, which is expensive for large-scale applications. Several recent works were proposed to approximate large-scale spectral sums utilizing polynomial function approximations and stochastic trace estimators. However, all prior works on this line have studied biased estimators, and their direct adaptions to an optimization task under stochastic gradient descent (SGD) frameworks often do not work as accumulated biased errors prevent stable convergence to the optimum. To address the issue, we propose the provable optimal unbiased estimator by randomizing Chebyshev polynomial degrees. We further introduce two additional techniques for accelerating SGD, where key ideas are on sharing randomness among many estimations during the iterative procedure. Finally, we showcase two applications of the proposed SGD schemes: matrix completion and learning Gaussian process, under the real-world datasets.